Abstract. In this synthesis paper addressing hydrologic scaling and similarity, we posit that the search for 
universal laws of hydrology is hindered by our focus on computational simulation (the third paradigm), and assert that it is 
time for hydrology to embrace a fourth paradigm of data-intensive science. Advances in information-based hydrologic 
science, coupled with an explosion of hydrologic data and advances in parameter estimation and modelling, have laid the 
foundation for a data-driven framework for scrutinizing hydrological scaling and similarity hypotheses. We summarize 
important scaling and similarity concepts (hypotheses) that require testing, describe a mutual information framework for 
testing these hypotheses, describe boundary condition, state/flux, and parameter data requirements across scales to support 
testing these hypotheses, and discuss some challenges to overcome while pursuing the fourth hydrological paradigm. We 
call upon the hydrologic sciences community to develop a focused effort towards adopting the fourth paradigm and apply 
this to outstanding challenges in scaling and similarity. 


1 Introduction 


This synthesis paper is an outcome of the “Symposium in Honor of Eric Wood: Observations and Modeling across Scales”, 
held June 2-3, 2016 in Princeton, New Jersey, USA. The focus of this contribution is the heterogeneity of hydrological 
processes, their organization, scaling and similarity, and the impact of the heterogeneity on water and energy states and 
fluxes (and vice versa). We argue here that the growth of hydrologic science, from empiricism (1st paradigm), via theory (2nd 
paradigm), to computational simulation (3rd paradigm) has yielded important advances in understanding and predictive 
capabilities; yet we argue that accelerating advances in hydrologic science will require us to embrace the 4th paradigm of 
data-intensive science, to use emerging datasets to synthesize/scrutinize theories and models, and improve the data support 
for the mechanisms of Earth System change. 


The Fourth Paradigm is a concept that focuses on how science can be advanced by enabling full exploitation of data via new 
computational methods. The concept is based on the idea that computational science constitutes a new set of methods 
beyond empiricism, theory, and simulation, and is concerned with data discovery in the sense that researchers and scientists 
require tools, technologies, and platforms that seamlessly integrate into standard scientific methodologies and processes. By 
integrating these tools and technologies for research, we provide new opportunities for researchers and scientists to share and 
analyze data and thereby encourage new scientific discovery. As shown in Figure 1, the scientific method applied to 
hydrology is not a linear process; rather, because hydrology is already in the 3rd paradigm, empiricism (the 1st paradigm) 
and theoretical development (the 2nd paradigm) both lead to new theories and hypotheses that are embodied in computational 
models. These hypotheses may not be rigorously tested with many datasets, either because the datasets have not been 
gathered into an effective, accessible platform, or because the datasets require additional processing and information 
theoretic techniques to apply them to the model predictions for hypothesis testing. Further, as noted by Pfister and Kirchner 
(2017), hypothesis testing with models is fraught with challenges that require not only consideration of the data required to 
test a given hypothesis, but also careful consideration of how to encode hypotheses as uniquely falsifiable predictions 
(Figure 1). Advances in data science now allow the 4th paradigm to inject “big data” into the scientific method using 
rigorous information theoretic methods without eliminating the other parts of the scientific method. 


Our focus here on scaling and similarity directs attention to one of the most challenging problems in the hydrologic sciences. 
As defined by Blöschl and Sivapalan (1995), scale is a “characteristic length (or time) of process, observation, model” and 
scaling is a “transfer of information across scales” (see also Bierkens et al., 2000; Grayson and Blöschl, 2000). Functional 
relationships between hydrologic variables may also exist and these may be scale-independent (or scale-invariant). 
Similarity is present when characteristics of one system can be related to the corresponding characteristics of another system 
by a simple conversion factor, called the scale factor. We should note that the terms ‘scaling’ and ‘similarity’ used here are 
specific to the hydrology literature and distinct from the general notions of self-similarity, fractals, and emergent behavior in 
the nonlinear dynamics literature. Classic examples of similarity include the ratio of catchment areas (Willgoose et al., 1991; 
Smith, 1992), and the topographic index ln(a/tan β) (Beven and Kirkby, 1979) that are used for relating flows of two 
catchments and relating the topographic slopes and contributing areas to water table depths, respectively. Another example is 
the hillslope Péclet number (Berne et al., 2005; Lyon and Troch, 2007). Heterogeneity or variability in hydrology manifests 
itself at multiple spatial scales (e.g., Seyfried and Wilcox, 1995; Blöschl and Sivapalan, 1995), from local (O(1 m); e.g., 
macropores) to hillslope (O(100 m); e.g., preferential flowpaths) to catchment (O(10 km); e.g., soils) and regional (O(1000 
km); e.g., geology). Similarly, temporal variability is reflected on event, seasonal and decadal time scales (e.g., Woods, 
2005). Understanding scaling and similarity requires understanding how the interactions among multiple processes across 
scales affect the (emergent) hydrologic behaviour at other space-time scales; such understanding underpins methods for 
computational simulation. 
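
As a minimal sketch of the two classic similarity measures mentioned above, the following Python snippet computes the topographic index ln(a/tan β) for a few cells and transfers a flow between two catchments using the ratio of their areas as the scale factor. The function names, the `min_slope` guard, and all numerical values are illustrative assumptions rather than anything prescribed by the cited studies.

```python
import numpy as np

def topographic_index(specific_area, slope_rad, min_slope=1e-6):
    """Return ln(a / tan(beta)) for each cell.

    specific_area: upslope contributing area per unit contour length, a (m)
    slope_rad:     local surface slope, beta (radians)
    min_slope:     hypothetical lower bound on tan(beta) to avoid division by zero
    """
    a = np.asarray(specific_area, dtype=float)
    tan_beta = np.maximum(np.tan(np.asarray(slope_rad, dtype=float)), min_slope)
    return np.log(a / tan_beta)

def area_ratio_transfer(flow_donor, area_donor, area_target):
    """Scale a donor-catchment flow by the ratio of catchment areas (the scale factor)."""
    return flow_donor * (area_target / area_donor)

# Illustrative values only: three cells from ridge to valley bottom.
a = np.array([10.0, 100.0, 1000.0])      # m
beta = np.deg2rad([20.0, 10.0, 2.0])     # radians
ti = topographic_index(a, beta)

# Illustrative flow transfer from a 50 km^2 donor to a 20 km^2 target catchment.
q_target = area_ratio_transfer(flow_donor=12.0, area_donor=50.0, area_target=20.0)
```

Cells with a large contributing area and a gentle slope receive the largest index values, which is the sense in which the index flags areas of topographic convergence.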


The scaling and similarity problem is nevertheless very difficult. As asserted by Dooge (1986), “within the physical sciences 
and the earth sciences there is and can be no universal model for water movement.” Despite numerous attempts at 
integrating local models across soils (e.g. Kim et al., 1997), hillslopes (Troch et al., 2015) and watersheds (e.g., Reggiani et 
al., 1998, 1999, 2000, 2001), universal laws in hydrology and the required closure relations remain elusive because the 
physics are likely scale-dependent (e.g. Bierkens, 1996) and the data required to test these hypotheses are either not readily 
available or not easily synthesized, or, even worse, would never be observable (Beven, 2006). Further, computational 
advances have enabled so-called “hyper-resolution” or, using an alternative term that is not necessarily equivalent, 
“hillslope-resolving” modelling (e.g. Chaney et al., 2016; Wood et al., 2011), but as noted in the discussion between Beven 
and Cloke (2012) and Wood et al. (2012), and later discussed in Beven et al. (2015), the ability to provide meaningful 
information from hillslope-resolving models is limited both by a lack of tested parameterizations at a given model scale as 
well as by lack of data for model evaluation (e.g. Melsen et al., 2016a). 


In principle, moving to finer spatial and temporal resolutions may improve accuracy simply by reducing the truncation error 
in the numerical solution of the system of partial differential equations. In an analogy with fluid mechanics and the 
atmospheric sciences where “large eddy simulations” are designed to capture the most energetic motions and thereby reduce 
the sensitivity to turbulence closure, one might ask whether “hillslope-resolving” models might resolve the most energetic 
components (in an information theoretic/entropy sense) of the terrestrial water storage spectrum such that the closure 
problem may be simplified. As discussed in many of the studies cited above, topography is fractal and this, combined with 
scaling between the pedon and the hillslope, drives much of the scaling behavior seen in hydrology. Most of the apparent 
fractal nature in relation to hydrology has been demonstrated at the scale of river networks (e.g. Tarboton et al., 1988), so a 
hypothesis that could be tested with data following the 4th paradigm is to what extent resolving these river networks in 
models reduces the information loss. Further, proposed scaling relationships may be appropriate above a given scale, but as 
we move downward in scales from watershed to hillslope to local, these relationships may break down. 


These current tactics in the hydrologic sciences are representative of the third paradigm of scientific investigation (Hey et al., 
2009), characterized by applying computational science to simulate complex systems. The so-called third paradigm builds 
on the earlier first (empirical) and second (theoretical) paradigms. As discussed by Clark et al. (this issue), computational 
science approaches to modeling hydrologic systems have been discussed for decades. With the advent of high-resolution 
earth observing systems (McCabe et al., this issue), proximal sensing (Robinson et al., 2008), sensor networks (Xia et al., 
2015), and advances in data-intensive hydrologic science (e.g., Nearing and Gupta, 2015), there is now an opportunity to 
recast the hydrologic scaling problem into a data-driven hypothesis testing framework (e.g., Rakovec et al., 2016a). By 
embracing such a framework, hydrologic analysis can become explicitly “scale-aware” by testing specific parameterizations 
at a given model scale. Now is the time for a fourth paradigm in hydrologic science. 


With this goal in mind, this paper addresses the following questions: 


1. What are the key scaling and similarity concepts (hypotheses) that require testing? 
2. What framework could we use to test these hypotheses? 
3. What are the data requirements to test these hypotheses? 
4. What are the model requirements to test these hypotheses? 


2 Scaling and similarity hypotheses 


Most scaling work to date has built on the Representative Elementary Area (REA) concept (Wood et al., 1988), and 
extensions to the Representative Elementary Watersheds (REW) introduced by Reggiani et al. (1998, 1999, 2000, 2001) — 
the REA/REW concept seeks to define physically meaningful control volumes for which it is possible to obtain simpler 
descriptions of the rainfall-runoff process (i.e., simpler than those at the point scale). An alternative, but related, concept is 
the Representative Hillslope (RH; Troch et al., 2003; Berne et al., 2005; Hazenberg et al., 2015). The REA/REW approach is 
conceptually similar to Reynolds averaging, and relies on the fundamental assumption that the physics are known at the 
smallest scale considered (e.g. Miller and Miller, 1956). Critically, the fluxes at the boundaries of the model control volumes 
require parameterization (the so-called “closure” relations). These closure assumptions are typically ad-hoc, and include sub-grid 
probability distributions, scale-aware parameters, or new flux parameterizations. Fundamentally, these approaches 
conform to the third paradigm, in the sense that they take as given a set of conservation equations that govern behaviour at 
the fundamental (patch, tile, grid, hillslope, or REW) scale (Figure 2). Testing both the scaling and closure assumptions as 
hypotheses using data would move hydrology towards the fourth paradigm. 


The examples above represent the classic “Newtonian” approach in hydrology, but the 4th paradigm advocated here is not 
specific to testing hypotheses derived from that approach, and as shown in Figure 1, represents an augmentation to the 
scientific method in hydrology. Foundational (Sivapalan, 2005; McDonnell et al., 2007) and more recent work (Thompson 
et al., 2011; Harman and Troch, 2014) on “Darwinian” hydrology has used scale and similarity concepts to synthesize 
catchments across scales, places and processes. As noted in McDonnell et al. (2007), there has been a call for a reconciliation 
of the Newtonian and Darwinian approaches, starting first in the ecology community (Harte, 2002), and we believe that 
moving to a 4th paradigm with the augmented scientific method depicted in Figure 1 will embody the wishes of Darwin from 
his “Structure of Coral Reefs” as quoted in Harman and Troch (2014): 


“..In effect, what an immense addition to our knowledge of the laws of nature should we possess if a tithe of the facts 
dispersed in the Journals of observant travellers, in the Transactions of academies and learned societies, were collected 
together and judiciously arranged! From their very juxtaposition, plan, co-relation, and harmony, before unsuspected, would 
become instantly visible, or the causes of anomaly be rendered apparent; erroneous opinions would at once be detected; and 
new truths — satisfactory as such alone, or supplying corollaries of practical utility — be added to the mass of human 


knowledge. A better testimony to the justice of this remark can hardly be afforded than in the work before us.” 


An important avenue to advance hydrologic understanding and predictive capabilities is through attention to hypotheses of 
hydrologic scaling and similarity, i.e., different ways to relate processes and process interactions across spatial scales. One of 
the foundational works in hydrologic similarity is the topographic index (Beven and Kirkby, 1979) — the topographic index 
defines local areas of topographic convergence, and is used to relate the probability distribution of local water table 
fluctuations to catchment-average surface runoff and sub-surface flow. Building on this topographic similarity, this index 
was expanded to include soils and study runoff production (Sivapalan et al., 1987; Sivapalan et al., 1990), and further 
applied to examine scaling of evaporation (Famiglietti and Wood, 1994) and soil moisture (Wood, 1995; Peters-Lidard et al., 
2001). Such controls of water table depth on runoff production and evapotranspiration at catchment scales represent just one 
hypothesis of similarity and scaling behaviour — an example alternative hypothesis, used in the VIC model (Liang et al., 
1994), is the description of how sub-element variability in soil moisture affects the development of saturated areas in a 
catchment and the partitioning of precipitation into surface runoff and infiltration (Moore and Clarke, 1981; Dümenil and 
Todini, 1992; Wood et al., 1992; Hagemann and Gates, 2003). Other scaling hypotheses are used for other physical 
processes, for example, how small-scale variability in snow affects large-scale snow melt (Luce et al., 1999; Liston, 2004; 
Clark et al., 2011a), and how energy fluxes for individual leaves scale up to the vegetation canopy (de Pury and Farquhar, 
1997; Wang and Leuning, 1998). 


The critical issue here is the interplay between the scale of the model elements and the choice of the closure relations: As 
computational resources permit higher resolution simulations across larger domains (Wood et al., 2011), more physical 
processes can be represented explicitly, and the closure relations must be tailored to fit the spatial scale of the model 
simulation. To some extent such “hyper-resolution” approaches abandon the quest for physically meaningful control 
volumes that characterizes the REA and REW concepts, and the representation of sub-element processes in fully 3-D 
simulation of watersheds (e.g., Kollet and Maxwell, 2008; Maxwell and Miller, 2005) is becoming less and less obvious, and 
perhaps less and less necessary. A key question now is whether “hyper-resolution” applications through explicit 3-D models, 
or (at least for some variables) with clustered 2-D simulations (e.g., the HydroBlocks of Chaney et al., 2016), provide 
reasonable representations of scaling and similarity. Considering infiltration excess and saturation excess runoff generation 
processes, high resolution numerical studies indicate that excess infiltration does not appear to have an ergodic limit (e.g. 
Maxwell and Kollet, 2008), while excess saturation processes scale with the geometric mean of subsurface saturated hydraulic 
conductivity (e.g. Meyerhoff and Maxwell, 2011). Similarly, one might imagine different scaling relations for 
evapotranspiration depending on the nature of controls due to radiation (topography), vegetation, and/or soil moisture (e.g., 
Rigden and Salvucci, 2015). For example, as recently shown by Maxwell and Condon (2016), the interplay of water table 
depths with rooting depths along a given hillslope exerts different controls on evaporation and transpiration, which links the 
water table dynamics with the land surface energy balance, even at continental scales. This finding is based on limited data, 
and would benefit from formal hypothesis testing in an information-based framework, as described in the next section. 


3 A hypothesis testing framework for hydrologic scaling and similarity 


As demand increases for hillslope-resolving or “hyper-resolution” modelling (e.g., Beven et al., 2015; Beven and Cloke, 
2012; Bierkens et al., 2015; Wood et al., 2011, 2012), the question arises as to whether the physics in our models, the 
parameters that are used in the models, and the input data (e.g., “forcings”) are adequate to support such endeavours (e.g. 
Melsen et al., 2016b). Following from Nearing and Gupta (2015), we can formulate a framework for testing hypotheses 
based on measuring information provided by a model (e.g., parameterizations based on similarity concepts) as distinct from 
information provided to a model (e.g., forcing data or parameters). We should note that this is not hypothesis testing in the 
traditional sense, but rather a framework for scrutinizing hydrological scaling and similarity hypotheses with data. This 
concept was demonstrated by Nearing et al. (2016), who evaluated the information loss due to forcing data, parameters, and 
physics in the North American Land Data Assimilation System (NLDAS) model ensemble. In this example, information was 
first measured using point data for soil moisture and evaporation, and compared to regressions that are kernel density 
estimators of the conditional probability densities and represent the upper bound of information available on a given variable 
from the forcing data alone and given the forcing data and parameters. As shown in Figure 2, we can measure the total 
information about a given variable z contained in observations (H(z), left bar), and then measure the information about that 
variable provided by a given model simulation (I(z; y^M), right bar). The intermediate bars represent losses of information due 
to forcing data (boundary conditions) and due to parameters. 
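
To make these two quantities concrete, the sketch below estimates the entropy of the observations, H(z), and the mutual information between the observations and a model simulation using simple histogram binning on synthetic data. This is a minimal illustration and not the estimator used by Nearing et al. (2016): the bin count, the synthetic "observations", and the two hypothetical model outputs are all assumptions.

```python
import numpy as np

def entropy(z, bins=20):
    """Shannon entropy (bits) of z after discretizing into equal-width bins."""
    counts, _ = np.histogram(z, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def mutual_information(z, y, bins=20):
    """Mutual information (bits) between z and y from a 2-D histogram."""
    joint, _, _ = np.histogram2d(z, y, bins=bins)
    p_zy = joint / joint.sum()
    p_z = p_zy.sum(axis=1, keepdims=True)   # marginal of z, shape (bins, 1)
    p_y = p_zy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    mask = p_zy > 0
    return float(np.sum(p_zy[mask] * np.log2(p_zy[mask] / (p_z @ p_y)[mask])))

# Synthetic stand-ins: "observations" and two hypothetical model simulations.
rng = np.random.default_rng(0)
z_obs = rng.normal(size=5000)
y_good = z_obs + 0.3 * rng.normal(size=5000)   # model that tracks z closely
y_poor = z_obs + 2.0 * rng.normal(size=5000)   # model that tracks z weakly

h_obs = entropy(z_obs)                          # total information in observations
info_good = mutual_information(z_obs, y_good)   # information provided by each model
info_poor = mutual_information(z_obs, y_poor)
missing_good = 1.0 - info_good / h_obs          # fraction of information not captured
missing_poor = 1.0 - info_poor / h_obs
```

Histogram estimates are biased for small samples and sensitive to bin width, so the discretization should be held fixed when comparing competing model representations.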


If we take this example, and expand it to conceptualize a framework for hypothesis testing in hydrology, we can imagine 
multiple instances of H(z) computed at different spatial scales, as well as multiple instances of mutual information I(z; y^M), 
computed for models employing different representations of processes at that scale. One concrete example hypothesis 
described in the previous section is the use of TOPMODEL parameterizations for groundwater, versus representative 
hillslopes, versus “HydroBlocks” (Chaney et al., 2016) versus explicit 3-D modeling. 


Critical to this exercise is the availability of forcing data, such as precipitation, radiation, humidity, temperature and wind 
speed, that have sufficient information content at the scale being evaluated such that they can adequately characterize the 
variable (e.g., soil moisture) or process (e.g., evapotranspiration; runoff) being studied (e.g. Berne et al., 2004). Similarly, 
the parameters provided to the model must also contain information about the variable or process being studied at a 
particular spatial and temporal scale. The Nearing and Gupta approach provides a framework for explicitly measuring the 
information available from observations, comparing that to information provided by a model, and attributing lost information 
to forcings, parameters and physics, and hence provides a rigorous method to test our physics assumptions by confronting 
them with observations. Clearly, this leads to requirements for data that can support such a framework. 


4 Data requirements 


As shown in Figure 1, the 4th paradigm for hydrology is characterized by the rigorous application of large datasets towards 
testing hypotheses as encapsulated in models. The process of constructing models requires observations both as input data, 
and for model and process validation or hypotheses testing. A distinguishing characteristic of data for model and process 
validation will be that we are observing spatial and temporal patterns of fluxes and states represented in our modeling 
framework, for example, soil moisture, snow pack or evapotranspiration. As discussed by McCabe et al. (this issue), there 
has been a dramatic increase in the type and density of hydrologic information that is becoming available at multiple scales, 
from point- to meso-scale and regional to global. For example, the growing number of remote sensing missions dedicated to 
observing the water cycle allows further development of (large scale) hydrological models and data assimilation frameworks 
for more accurate soil moisture, evaporation, and streamflow prediction. In particular, there are exciting developments in 
meso-scale (i.e. hillslope to catchment) observations, which are critical for testing hypotheses about scaling (REA, RH, 
REW) by connecting point measurements, hydrological models, and remote sensing observations. Examples range from recent 
advances in cosmic ray neutron sensors (Franz et al., 2015; Köhli et al., 2016; Zreda et al., 2008), distributed temperature 
sensing (DTS; Steele-Dunne et al., 2010; Bense et al., 2016; Dong et al., 2016), soil moisture observations, the use of 
crowdsourcing (De Vos et al., 2016) and microwave signal propagation from telecommunications towers for precipitation (Leijnse 
et al., 2007), to the rise in the use of unmanned autonomous vehicles to characterize the landscape at centimeter scale 
(Vivoni et al., 2014). These alternative data sources enhance our ability to observe, understand, and simulate the 
hydrological cycle. Advances in citizen science (Buytaert et al., 2014; Hut et al., 2016) and the use of so-called “soft” data 
for hydrological modeling (Van Emmerik et al., 2015; Seibert and McDonnell, 2002) show that even though these new data 
are collected on nontraditional spatiotemporal scales, they might give us new insights into how processes at different scales are 
coupled. 


Advances in hydrogeophysical characterization of the subsurface (Binley et al., 2015), such as electrical methods, ground 
penetrating radar and gravimetry, offer non-invasive meso-scale information that can be used to provide parameters or to 
infer boundary conditions, states or fluxes. Recently, Christensen et al. (2017) demonstrated that dense airborne 
electromagnetic data can be used to map hydrostratigraphic zones, which is an encouraging capability. Imaging the subsoil 
may be feasible at local scales, but it is a challenge at river basin or continental scales. Hence, we encourage more joint 
efforts in hydrogeophysical imaging for integrated characterization of the subsurface. 


Combined, these observations may be used in a benchmarking exercise similar to Nearing et al. (2016). Synthesizing 
hydrogeophysical methods with point observations and laboratory/field techniques for estimating "effective" soil hydraulic 
functions/parameters is a challenging opportunity (e.g. Kim et al., 1997), but one which might be tractable using a data-driven 
hypothesis testing framework. These new data sources allow us to understand and apply scaling between data sources 
(point scale to remotely sensed data) and between model scales, and provide the critical data required to test alternative 
scaling hypotheses. 


Beyond the new meso-scale observations, extensive catchment databases now exist to support hypothesis testing including 
the TERENO (Zacharias et al., 2011), MOPEX (Duan et al., 2006), CONUS benchmarking (Newman et al., 2015a), GRDC 
(http://www.bafg.de/GRDC/EN/01_GRDC/13_dtbse/database_node.html) and EURO-FRIEND databases (Stahl et al., 
2010). Recent similarity studies (Sawicz et al., 2011) have systematically analyzed large numbers of catchments focusing on 
streamflow-oriented signatures such as the runoff coefficient, baseflow index and slope of the flow duration curve, and then 
have explored relationships between these signatures and model process time scales (Carrillo et al., 2011). Coopersmith et al. 
(2012) generalized this work with four nearly orthogonal signatures that included aridity, seasonality of rainfall, peak 
rainfall, and peak streamflow, and demonstrated that 77% of MOPEX catchments can be described by only six classes 
defined by combinations of the four signatures. Clearly there is information contained in these catchment databases about 
not just the coevolution of climate (forcing) and landscape properties (parameters), but also the physics of the catchment 
responses. Comparative hydrology (e.g., Kovacs, 1984; Falkenmark and Chapman, 1989; Gupta et al., 2014) takes a first 
needed step in the direction of the fourth paradigm, and following the framework described above, we can explicitly quantify 
the mutual information in the signatures, parameters and forcings to help elucidate these connections beyond classification. 
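
The streamflow signatures named above can be sketched in a few lines. The definitions used here are common choices adopted for illustration and are not necessarily those of the cited studies: the baseflow index uses a one-pass Lyne-Hollick recursive filter with a hypothetical parameter `alpha`, and the flow duration curve slope is taken between the 33% and 66% exceedance flows in log space. The precipitation and streamflow series are synthetic.

```python
import numpy as np

def runoff_coefficient(q, p):
    """Long-term runoff divided by long-term precipitation (same units)."""
    return float(np.sum(q) / np.sum(p))

def baseflow_index(q, alpha=0.925):
    """Baseflow volume over total flow volume, one-pass Lyne-Hollick filter."""
    q = np.asarray(q, dtype=float)
    quick = np.zeros_like(q)
    for t in range(1, len(q)):
        f = alpha * quick[t - 1] + 0.5 * (1.0 + alpha) * (q[t] - q[t - 1])
        quick[t] = min(max(f, 0.0), q[t])   # keep quickflow within [0, q]
    return float(np.sum(q - quick) / np.sum(q))

def fdc_slope(q, lo=0.33, hi=0.66):
    """Slope of the flow duration curve between two exceedance probabilities."""
    q_lo = np.quantile(q, 1.0 - lo)   # flow exceeded 33% of the time
    q_hi = np.quantile(q, 1.0 - hi)   # flow exceeded 66% of the time
    return float((np.log(q_lo) - np.log(q_hi)) / (hi - lo))

# Synthetic daily series: intermittent rain routed through a linear reservoir.
rng = np.random.default_rng(1)
n = 1000
p = rng.exponential(5.0, n) * (rng.random(n) < 0.3)   # mm/day
q = np.empty(n)
storage = 10.0
for t in range(n):
    storage += 0.4 * p[t]     # assume 40% of rain becomes recharge
    q[t] = 0.1 * storage      # linear reservoir outflow
    storage -= q[t]

signatures = {
    "runoff_coefficient": runoff_coefficient(q, p),
    "baseflow_index": baseflow_index(q),
    "fdc_slope": fdc_slope(q),
}
```

Computed over many catchments, such signature vectors are the kind of input that could be paired with forcing and parameter data to quantify shared information rather than only to classify.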
One of the crucial factors that complicate scaling is the anthropogenic effect on catchments. Recent advances in modeling 
the co-evolution of the human-water system (see e.g. Troy et al., 2015; Ciullo et al., 2017) focused on identifying generic 
key processes and relations. Yet, it is unknown how these relate to systems on larger (and smaller) scales. To arrive at new 
understandings of scaling and similarities in human-influenced catchments, studying these issues from a socio-hydrological 
point of view should be an integrated part of the way forward (e.g. Van Loon et al., 2016). 


5 Modeling framework requirements 


Embracing the fourth paradigm in hydrology will face several challenges. First, it is necessary to implement/extend a 
hydrologic modelling framework with sufficient flexibility to evaluate competing hypotheses of similarity and scaling 
behavior (Clark et al., 2011b). One possible framework is the Structure for Unifying Multiple Modeling Alternatives 
(SUMMA), recently introduced by Clark et al. (2015), which has the capability to incorporate alternative spatial 
configurations and alternative flux parameterizations. Frameworks like SUMMA, which pursue the method of multiple 
working hypotheses, enable decomposing complex models into the individual decisions made as part of model development, 
and focusing attention on specific decisions (e.g., related to scaling and similarity) while keeping all other components of a 
model constant, hence enabling users to isolate and scrutinize specific hypotheses. One confounding issue is that models 
with parameterizations designed to represent sub-grid processes may not add information in a manner proportional to 
increased information in the inputs, while models that have a single column tile / subtile form may show a more direct 
relationship between information in inputs and information in outputs. Similarly, integrated models with lateral flow of water 
in surface and subsurface systems that generate runoff directly will have a different spatial sensitivity to the resolution of the 
input data than more traditional land surface models with no lateral flow and a parameterized runoff generation. Hence, the 
modeling framework must be able to isolate the role that surface and subsurface connectivity play in processing information 
at different scales. 


A second challenge consists of understanding how to deal with different uncertainties/errors of different observational 
products and hydrologic models when comparing them for studying the scaling behavior. Several papers have highlighted 
the problem of different climatologies or sensitivities of remote sensing products (e.g. Albergel et al., 2012; Brocca et al., 
2011), gridded meteorological products (Clark and Slater, 2006; Newman et al., 2015b), and streamflow observations (Di 
Baldassarre and Montanari, 2009; McMillan et al., 2010). A true correspondence of these remotely sensed variables with 
model results is often hampered, due to vertical mismatches in the soil column between the different products (Wilker et al., 
2006), approximations in the structure of the hydrological model used, its parameterization and discretization, the initial 
conditions, and errors in forcing data (De Lannoy et al., 2007). Because of this, modeled variables often do not correspond 
well to observations; nevertheless, similar trends and dynamics between the different products are found (Koster et al., 
2009). 


In several data assimilation studies, the problem of differences in climatologies is resolved by bias-correcting the 
observations towards the model (e.g. Crow et al., 2005; Kumar et al., 2014; Lievens et al., 2015a, 2015b; Martens et al., 
2016; Reichle and Koster, 2004; Sahoo et al., 2013; Verhoest et al., 2015). Yet, such (statistical) operations may not be 
appropriate for scaling studies. First of all, these methods only rescale the remotely sensed value, yet the uncertainties in 
these products need rescaling as well. Second, depending on the bias-correction method used (ranging from only correcting 
for the first moment to full CDF matching) different scaling relations may be found. Ideally, multiscale data should be used 
in a way that best demonstrates the ability of the models to reproduce processes at the scales at which those data are 
available, particularly with respect to reproducing attributes of dynamics, such as the time rate of decorrelation using an 
information metric, and the mutual information across variables, space and time. 
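
The sensitivity to the choice of bias correction can be illustrated with the two end members mentioned above: matching only the first moment versus full CDF matching. The sketch below is a minimal example using empirical quantiles on synthetic "observed" and "modeled" soil moisture; the function names and the distributions are hypothetical and chosen only to make the contrast visible.

```python
import numpy as np

def match_first_moment(obs, model):
    """Shift observations so their mean equals the model mean."""
    return obs - np.mean(obs) + np.mean(model)

def cdf_match(obs, model):
    """Map each observation to the model value at the same empirical quantile."""
    ranks = np.argsort(np.argsort(obs))     # 0 = smallest observation
    quantiles = (ranks + 0.5) / len(obs)
    return np.quantile(model, quantiles)

# Synthetic climatologies (volumetric soil moisture, m3/m3), illustrative only.
rng = np.random.default_rng(2)
model = rng.beta(2.0, 5.0, size=2000) * 0.4 + 0.05   # skewed, wide spread
obs = rng.normal(0.15, 0.02, size=2000)              # symmetric, narrow spread

obs_shifted = match_first_moment(obs, model)
obs_matched = cdf_match(obs, model)

# Variability after each correction: only CDF matching adopts the model spread.
spread = {
    "raw": float(obs.std()),
    "first_moment": float(obs_shifted.std()),
    "cdf": float(obs_matched.std()),
}
```

Because the first-moment correction leaves the variance and higher moments of the observations untouched while CDF matching replaces them with those of the model, any scaling relation derived from variability will differ between the two corrected products.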


Testing hypotheses with multiple scale information also require assimilation/modeling frameworks that allow integrating 
data into models at their native resolution so that simulations and observations can be compared without the need of 
introducing ad-hoc downscaling/upscaling rules. One such framework has recently been proposed by Rakovec et al. (2016b). 
This framework uses the multiscale parameter regionalization (MPR) technique (Samaniego et al., 2010) to link the 
resolutions of the various data sources with the target modeling resolution while keeping a single set of model transfer 
parameters that is applicable to all scales. As a result, seamless, flux-matching simulations can be obtained. The 
MPR-based assimilation framework proposed by Rakovec et al. (2016b) was originally tested with the mesoscale Hydrologic 
Model (mHM) (Kumar et al., 2013; Samaniego et al., 2010) in order to test hypotheses related to model transferability 
across scales and locations as well as to process description. The approach is general and can be used within any land 
surface or hydrologic model, for example within the SUMMA modeling framework (Clark et al., 2015), to test hypotheses 
about the appropriate model complexity at a given scale. A model-agnostic MPR system called MPR-Flex has recently been 
applied to the Variable Infiltration Capacity (VIC) model to estimate seamless parameter and flux fields over the 
contiguous United States (CONUS) (Mizukami et al., under review). This combination of model parameterization (MPR-Flex) 
and simulation frameworks (e.g., SUMMA, mHM) is a very promising avenue for testing scaling laws as well as the 
uncertainty decomposition described above. Finally, subjective modeling decisions (e.g., the choice of time step, spatial 
resolution, numerical scheme, study region, time period for calibration/validation, and performance metrics) and their 
associated uncertainties require further attention (e.g., Krueger et al., 2012). 
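
The core idea of MPR, a single set of transfer parameters applied at the data resolution followed by upscaling to any target resolution, can be sketched as follows. The linear transfer function, the harmonic-mean upscaling operator, the predictor field and the parameter values are all illustrative assumptions; they do not reproduce the transfer functions or operators used in mHM or MPR-Flex.

```python
import numpy as np

def transfer(predictor, gamma):
    """Hypothetical linear transfer function: fine-scale parameter from a predictor field."""
    return gamma[0] + gamma[1] * predictor

def upscale_harmonic(field, factor):
    """Harmonic-mean upscaling of a square field over factor x factor blocks."""
    n = field.shape[0] // factor
    blocks = field[:n * factor, :n * factor].reshape(n, factor, n, factor)
    return 1.0 / (1.0 / blocks).mean(axis=(1, 3))

rng = np.random.default_rng(1)
sand = rng.uniform(0.1, 0.9, size=(64, 64))   # synthetic predictor at the data resolution
gamma = (0.2, 1.5)                            # global transfer parameters shared by all scales

fine = transfer(sand, gamma)                  # parameter field at the data resolution
param_8x8 = upscale_harmonic(fine, 8)         # effective parameters on an 8 x 8 model grid
param_4x4 = upscale_harmonic(fine, 16)        # effective parameters on a 4 x 4 model grid

print(param_8x8.shape, param_4x4.shape)
```

Because the same gamma values generate the parameter fields at both model resolutions, changing the model grid requires no recalibration in this sketch; only the upscaling step is repeated.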


6 Summary and Next Steps 


In this paper we review advances in hydrologic scaling and similarity. Beginning with the challenge of Dooge (1986), we 
posit that the search for universal laws of hydrology is hindered by our third-paradigm approach, and assert 
that it is time for hydrology to embrace a fourth paradigm of data-intensive science. Building on other synthesis papers in 
this issue (Clark et al., McCabe et al.), we argue that advances in data-intensive hydrologic science (e.g., Nearing and 
Gupta, 2015) have laid the foundation for a data-driven hypothesis testing framework for scaling and similarity. To achieve 
this goal, we have (1) summarized important scaling and similarity concepts (hypotheses) that require testing; (2) described 
a mutual information framework for testing these hypotheses; (3) described boundary condition, state/flux, and parameter 
data requirements across scales to support testing these hypotheses; and (4) discussed some challenges to overcome while 
pursuing the fourth hydrological paradigm. 


Figure 1 illustrates the concept of embracing a 4th paradigm in hydrology, in which we enable a rigorous confrontation of 
the hypotheses embodied within our models with a range of data types across many locations and spatial-temporal scales. This 
paradigm represents a union and extension of previous scientific methods within a formal hypothesis-driven framework. 
Models are a synthesis of all that we have learned (e.g., conservation equations; constitutive relationships for soil 
infiltration), and data, particularly through first-paradigm examples like comparative hydrology, yield empirical 
relationships, signatures, and fingerprints that help lead to new understanding and theory (2nd paradigm). By coupling 
traditional (e.g., in situ) and new data sources (e.g., satellites) we can use the power of information theory and rigorous 
hypothesis testing to elucidate the causes of behaviours that may not be evident in the analysis of individual sites or 
catchments. In this sense, a move to the 4th paradigm means that we seek modelling-driven monitoring and, simultaneously, 
monitoring-driven modelling. The formal hypothesis-driven framework will indicate where our process understanding is weak 
because we cannot explain the data obtained at high resolution. In other cases, comprehensive integrated 
simulations and big-data relationships would allow us to identify where the measurement errors are too large (i.e., the 
data have little information content, or entropy) and to point out what kinds of sensors or new measurements are needed to 
improve our physical understanding. These are the feedback loops in Figure 1, and they represent two important paths to 
optimizing the use of models and data to enhance hydrologic science. 


As a next step, we propose a focused community effort to shape the development of the fourth paradigm for hydrology. To 
this end, a workshop following the publication of this special issue would be a good first step. 
Figure 1: An illustration of the scientific method in hydrology, highlighting how each component of the method reflects the 
various paradigms of science. The 4th paradigm is characterized by advanced data collection and analysis, as noted in the green 
boxes. Based on Figure 1 in Clark et al. (2016). 
Figure 2: Aggregation and scaling schematic following Wood (1995). 
Figure 3: A conceptual diagram of uncertainty decomposition using Shannon information following Nearing et al. (2016). The 
term H(z) represents the total uncertainty (entropy) in the benchmark observations, and I(z; u) represents the amount of 
information about the benchmark observations that is available from the forcing data. Uncertainty due to forcing data is the 
difference between the total entropy and the information available in the forcing data. The information in the parameters plus 
forcing data is I(z; u, θ), and I(z; u, θ) < I(z; u) because of errors in the parameters. The term I(z; y^M) is the total 
information available from the model, and I(z; y^M) < I(z; u, θ) because of model structural error. 
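
The ordering of information terms in Figure 3 can be checked numerically on a toy example. The sketch below is an assumption-laden illustration, not the estimator of Nearing et al. (2016): it uses synthetic discrete variables, plug-in (frequency-count) estimates, and a "model" that merely coarsens the forcing, and it omits the parameter term θ for brevity.

```python
import numpy as np

def entropy(x):
    """Plug-in Shannon entropy (bits) of a discrete 1-D sample."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def mutual_information(x, y):
    """Plug-in mutual information I(x; y) = H(x) + H(y) - H(x, y), in bits."""
    _, counts = np.unique(np.stack([x, y], axis=1), axis=0, return_counts=True)
    p = counts / counts.sum()
    h_xy = float(-np.sum(p * np.log2(p)))
    return entropy(x) + entropy(y) - h_xy

rng = np.random.default_rng(0)
n = 20000
u = rng.integers(0, 8, size=n)          # synthetic discrete "forcing"
z = u + rng.integers(-1, 2, size=n)     # synthetic "benchmark observations": forcing plus noise
y = u // 2                              # synthetic "model output": a coarsened function of u

h_z = entropy(z)                        # total uncertainty H(z)
i_zu = mutual_information(z, u)         # information available from the forcing, I(z; u)
i_zy = mutual_information(z, y)         # information delivered by the model, I(z; y)

print(f"H(z) = {h_z:.3f}, I(z; u) = {i_zu:.3f}, I(z; y) = {i_zy:.3f} bits")
```

In this toy setting H(z) - I(z; u) plays the role of the uncertainty due to forcing data, and I(z; u) - I(z; y) plays the role of the information lost by the model; with continuous variables, the binning and estimator choices would need far more care.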